Dataset description
There are two submissions: 10267 & 10270.
- In each submission, 2390 families with .vcf files are included.
- For each family, two vcf files are provided:
  - one named “sorted”,
  - the other named “annotated”.
Submission 10267
- For files named “sorted”,
  - 852 families without GL/PL information
  - 1537 families with valid GL/PL information
- For files named “annotated”,
  - 1096 families without GL/PL information
  - 1293 families with valid GL/PL information
    - 309 Trios
    - 984 Quads
Note that for FID:13562, the .vcf file contains no father information. Also, every family with valid GL/PL information in the “annotated” files also has valid GL/PL information in the “sorted” files.
Submission 10270
- For files named “sorted”, there is no GL/PL information.
- For files named “annotated”,
  - 596 families without valid GL/PL information
    - including 13 families with fewer than 2000 variants.
  - 1794 families with valid GL/PL information
Combined
Note that, combining 10267 & 10270, there are 2206 families with complete vcf files, which were used for further DNM analyses.
- 526 Trios
- 1680 Quads
- 1145 females, 2639 males and 102 with unknown sex information.
- corresponding to 2206 probands and 1680 unaffected siblings
  - Probands: 282 females, 1869 males and 55 with unknown sex information
  - Siblings: 863 females, 770 males and 47 with unknown sex information
Call de novo mutations
Triodenovo was used to call de novo mutations:
- Only variants with GL/PL information were retained.
- Families were split into parent-offspring trios.
- Filters: --minDP 7 --minDepth 10 and other default options
- Post-filters (following Homsy et al. 2015, Science):
  - For offspring: a minimum of 10 total reads, a minimum of 5 alternate allele reads, and a minimum alternate allele ratio of 20% if alternate allele reads ≥ 10 or 28% if alternate allele reads < 10
  - For parents: alternate allele ratio < 5%
The script is stored at /scratch/90days/uqywan67/auti_proj/SSC/scripts/call_deno.R
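The read-depth post-filters listed above can be sketched as follows. This is a minimal Python illustration, not the contents of call_deno.R: the function names are hypothetical, read counts are assumed to be available as (total reads, alternate reads) pairs, the parental threshold is assumed to apply to both parents, and a parent with zero coverage is assumed to fail.

```python
def offspring_passes(total_reads, alt_reads):
    """Offspring rule: >= 10 total reads, >= 5 alternate reads, and an
    alternate allele ratio of >= 20% (alt reads >= 10) or >= 28% (alt reads < 10)."""
    if total_reads < 10 or alt_reads < 5:
        return False
    ratio = alt_reads / total_reads
    min_ratio = 0.20 if alt_reads >= 10 else 0.28
    return ratio >= min_ratio


def parent_passes(total_reads, alt_reads):
    """Parent rule: alternate allele ratio < 5%.
    Assumption: a parent with no reads cannot be evaluated and fails."""
    if total_reads == 0:
        return False
    return alt_reads / total_reads < 0.05


def passes_post_filter(child, father, mother):
    """Each argument is a (total_reads, alt_reads) pair.
    Assumption: both parents must satisfy the parental rule."""
    return (
        offspring_passes(*child)
        and parent_passes(*father)
        and parent_passes(*mother)
    )
```

For example, `passes_post_filter((20, 6), (30, 0), (30, 1))` keeps the call (offspring ratio 30% with fewer than 10 alternate reads, both parents below 5%), whereas an offspring with 7 alternate reads out of 30 is rejected because 23% is below the 28% threshold.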
Annotation
- ANNOVAR was used to annotate refGene and allele frequencies.
- hg19refGene, exac03nonpsy, gnomad_exome211 databases were used.
- Based on the annotation, DNMs were further filtered to retain:
  - exonic or canonical splice-site variants
  - variants with MAF <= 0.001 in the non-psychiatric subset of ExAC (header: ExAC_nonpsych_ALL in ANNOVAR) and in the control samples of gnomAD (header: controls_AF_popmax in ANNOVAR).
- Gene-level pLI scores for PTVs were downloaded from ExAC.
- MPC scores for missense variants were annotated using VEP.
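The annotation-based filter can be sketched on a single ANNOVAR output row as below. This is an illustration under stated assumptions rather than the script actually used: the row is assumed to be a dict keyed by ANNOVAR column headers, the region is assumed to come from the Func.refGene column with the values "exonic" and "splicing", and a frequency of "." (variant absent from the database) is assumed to count as a frequency of 0. The function names are hypothetical.

```python
MAF_CUTOFF = 0.001  # threshold stated in the filter description


def parse_af(value):
    """Convert an ANNOVAR frequency field to a float.
    Assumption: '.' or an empty field means the variant is absent, i.e. AF = 0."""
    if value is None or value in (".", ""):
        return 0.0
    return float(value)


def keep_variant(row):
    """Return True if the row is exonic/splicing and rare in both databases."""
    # Func.refGene can hold several labels separated by ';' (e.g. 'exonic;splicing').
    functions = set(row["Func.refGene"].split(";"))
    in_region = bool(functions & {"exonic", "splicing"})
    rare_in_exac = parse_af(row["ExAC_nonpsych_ALL"]) <= MAF_CUTOFF
    rare_in_gnomad = parse_af(row["controls_AF_popmax"]) <= MAF_CUTOFF
    return in_region and rare_in_exac and rare_in_gnomad
```

Treating missing frequencies as 0 keeps variants that are absent from ExAC or gnomAD, which is the usual intent for a de novo call set. If missing values should instead be excluded, only `parse_af` needs to change.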
DNMs summary
After applying the filters, a total of 4168 DNMs were found in 2446 offspring from 1770 families.
- 3373/4168 (80.9%) DNMs matched published SSC DNMs from Krumm et al. 2015 and Iossifov et al. 2014.
- 351 Trios (with 612 DNMs) and 1419 Quads (with 3556 DNMs, including 1840 DNMs in 1086 probands and 1716 DNMs in 1009 siblings).
- 2837 DNMs in 1663 males, 1224 DNMs in 721 females and 107 DNMs in 62 individuals with unknown sex information.
- 2452 DNMs in 1437 probands and 1716 DNMs in 1009 siblings.
- 2831 DNMs were absent from ExAC, 2905 DNMs were absent from gnomAD, and 2614 DNMs were absent from both datasets.
DNM counts
Note that individuals with DNM counts > 7 were excluded; this cutoff corresponds to the 99% quantile.

All DNMs

pLI tiers

MPC tiers

DNMs in Quads
- A total of 3556 DNMs were observed in 1419 Quads
- 1840 DNMs in 1086 probands and 1716 DNMs in 1009 siblings
- 2341 DNMs in 1376 males and 1121 DNMs in 663 females; 94 DNMs in 56 individuals without sex information.
DNM counts

All DNMs

pLI tiers

MPC tiers

Burden analysis
As noted above, a total of 526 Trios and 1680 Quads were used for further DNM analyses, corresponding to 2206 probands and 1680 siblings in total.
- At least one DNM event was observed in 1437/2206 (65.1%) probands and 1009/1680 (60.1%) unaffected siblings.
- The DNM counts per individual were 1.11 (2452/2206) and 1.02 (1716/1680) for probands and siblings, respectively.
- For missense variants, there were 0.70 (1538/2206) DNMs per sample in probands and 0.66 (1105/1680) DNMs per sample in siblings.
- For PTVs, there were 0.10 (218/2206) DNMs per sample in probands and 0.05 (92/1680) DNMs per sample in siblings.
- For synonymous variants, there were 0.28 (625/2206) DNMs per sample in probands and 0.27 (449/1680) DNMs per sample in siblings.
A two-sided exact binomial test was used to assess the burden of DNMs between different groups.
- binom.test(x, n, p, alternative = "two.sided").
- e.g. for probands vs. siblings, x (the number of successes) is the number of proband variants; n (the number of trials) is the total number of proband and sibling variants; and p (the hypothesized probability of success) is the fraction of individuals that are probands.
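To make the choice of x, n and p concrete, the test can be sketched in pure Python as below. This is not the R call used in the analysis. It is a stand-in that is intended to follow the same two-sided convention as binom.test: sum the probabilities of all outcomes that are no more likely than the observed one. The function names are hypothetical, and the log-space computation is an implementation choice to avoid floating-point underflow with several thousand trials.

```python
import math


def binom_logpmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def binom_test_two_sided(x, n, p):
    """Two-sided exact binomial p-value: total probability of every outcome
    that is no more likely than the observed count x."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    observed = binom_logpmf(x, n, p)
    tol = 1e-7  # small tolerance so outcomes tied with the observed one are counted
    total = sum(
        math.exp(lp)
        for lp in (binom_logpmf(k, n, p) for k in range(n + 1))
        if lp <= observed + tol
    )
    return min(1.0, total)


# Probands vs. siblings, all DNMs (counts taken from the summary above).
n_probands, n_siblings = 2206, 1680
x = 2452                                      # DNMs observed in probands
n = 2452 + 1716                               # DNMs observed in probands and siblings
p0 = n_probands / (n_probands + n_siblings)   # fraction of individuals that are probands
p_value = binom_test_two_sided(x, n, p0)
```

Under the null hypothesis each DNM is equally likely to fall in any individual, so the expected share of DNMs in probands equals the share of individuals that are probands, which is why p is set to 2206/3886 rather than 0.5.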
Summary of significant findings in burden analyses
- Finding 1: There is a 1.09-fold enrichment of DNMs (2452 in 2206 probands versus 1716 in 1680 siblings; 1.11 versus 1.02 variants per sample; p = 7.17E-3).
- Finding 2: There is a 1.80-fold enrichment of de novo PTVs (218 in 2206 probands versus 92 in 1680 siblings; 0.099 versus 0.055 variants per sample; p = 1.56E-4).
- Finding 3: There is a 2.17-fold enrichment of pLI tier [0.5, 0.995) (37 in 2206 probands versus 13 in 1680 siblings; 0.017 versus 0.008 variants per sample; p = 1.48E-2).
- Finding 4: There is a 4.08-fold enrichment of pLI tier [0.995, ∞) (59 in 2206 probands versus 11 in 1680 siblings; 0.027 versus 0.007 variants per sample; p = 1.36E-6).
- Finding 5: There is a 1.75-fold enrichment of MPC tier [2, ∞) (106 in 2206 probands versus 46 in 1680 siblings; 0.048 versus 0.027 variants per sample; p = 1.33E-3).
- Finding 6: In each RRB cluster, there is a significant enrichment of de novo PTVs in probands versus in siblings
- Finding 7: In each RRB cluster, there is a significant enrichment of pLI tier [0.995, ∞) in probands versus in siblings.
- Finding 8: There is a 1.75-fold enrichment of MPC tier [2, ∞) between RRB clusters (41 in 594 Cluster1 samples versus 39 in 988 Cluster2 samples; 0.069 versus 0.039 variants per sample; p = 1.48E-2).
- Finding 9: There is a 2.52-fold enrichment of MPC tier [2, ∞) in RRB Cluster1 (41 in 594 Cluster1 probands versus 46 in 1680 siblings; 0.069 versus 0.027 variants per sample; p = 2.51E-5).
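The fold-enrichment figures in the findings are consistent with a ratio of per-sample variant rates. The sketch below assumes that definition (the notes do not state it explicitly) and uses hypothetical function names; the counts are taken from Findings 1 and 2.

```python
def per_sample_rate(n_variants, n_samples):
    """Average number of variants per individual in a group."""
    return n_variants / n_samples


def fold_enrichment(case_variants, case_samples, ctrl_variants, ctrl_samples):
    """Ratio of the per-sample rate in cases to the per-sample rate in controls."""
    return per_sample_rate(case_variants, case_samples) / per_sample_rate(
        ctrl_variants, ctrl_samples
    )


all_dnm_fold = fold_enrichment(2452, 2206, 1716, 1680)  # Finding 1: all DNMs
ptv_fold = fold_enrichment(218, 2206, 92, 1680)         # Finding 2: de novo PTVs
print(f"{all_dnm_fold:.2f} {ptv_fold:.2f}")  # prints "1.09 1.80"
```

Dividing by group size before taking the ratio matters because there are more probands (2206) than siblings (1680); a ratio of raw counts would overstate the enrichment.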
All DNMs
Probands VS Siblings

Females VS Males

pLI tiers

MPC tiers

RRB cluster
All DNMs

pLI tiers

MPC tiers
